Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information, leading to an inflated sense of their capability.

We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance the popular VQA dataset by collecting complementary images such that every question in our balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question. Our dataset is by construction more balanced than the original VQA dataset and has approximately twice the number of image-question pairs. Our complete balanced dataset is publicly available at www.visualqa.org as part of the 2nd iteration of the Visual Question Answering Dataset and Challenge (VQA v2.0). We further benchmark a number of state-of-the-art VQA models on our balanced dataset. All models perform significantly worse on our balanced dataset, suggesting that these models have indeed learned to exploit language priors. This finding provides the first concrete empirical evidence for what seems to be a qualitative sense among practitioners.

Finally, our data collection protocol for identifying complementary images enables us to develop a novel interpretable model, which, in addition to providing an answer to the given (image, question) pair, also provides a counter-example based explanation. Specifically, it identifies an image that is similar to the original image but that it believes has a different answer to the same question. This can help in building trust for machines among their users.
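To make the structure of the balanced dataset concrete, the sketch below shows one way a complementary pair could be represented in code. The class and field names, and the example question and answers, are illustrative assumptions rather than the dataset's actual annotation schema.

```python
from dataclasses import dataclass


@dataclass
class VQAExample:
    """A single (image, question, answer) triple, as in the original VQA dataset."""
    image_id: str
    question: str
    answer: str


@dataclass
class ComplementaryPair:
    """A balanced entry: the same question paired with two similar images
    that yield different answers, so language priors alone cannot answer it."""
    question: str
    original: VQAExample
    complement: VQAExample


# Hypothetical example pair (image ids, question, and answers are made up):
pair = ComplementaryPair(
    question="Is the umbrella upside down?",
    original=VQAExample(image_id="img_0001", question="Is the umbrella upside down?", answer="yes"),
    complement=VQAExample(image_id="img_0002", question="Is the umbrella upside down?", answer="no"),
)
```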